ABSTRACT

Under certain conditions the state space of a discrete parameter
Markov Chain may be partitioned to form a smaller "lumped" chain that
retains the Markov property. The problem of formulating lumpability
hypotheses when the transition probability matrix P is not known and,
hence, must be estimated is discussed. An approximate test of these
hypotheses is described based on well known non-parametric methods. The
procedure is illustrated by an example.




INTRODUCTION 

Under  certain  conditions  the  state  space  of  a  discrete  parameter 
Markov  Chain  may  be  partitioned  into  subsets  of  states  each  of  which 
may  be  treated  as  a  single  state  of  a  smaller  chain  that  retains  the 
Markov  property.   Such  a  chain  is  said  to  be  "lumpable"  or  "weakly 
lumpable",  depending  upon  conditions,  and  elsewhere  has  been  referred 
to  as  a  "mergable  process"  [3]  and  a  "chain  with  collapsed  states" 
[2]. The resulting smaller chain is called a "lumped chain"; conditions
allowing  lumping  are  discussed  in  detail  by  Kemeny  and  Snell  [4].   A 
practical  problem  arises  in  examining  lumpability  conditions  when  the 
matrix  of  transition  probabilities  P   is  not  known,  but  a  number  of 
transitions  have  been  observed.   Billingsley  [1]  discusses  related 
problems  of  statistical  inference  for  Markov  Chains,  but  statistical 
considerations  of  lumpability  have  apparently  not  been  investigated. 

In particular, we consider an aperiodic Markov Chain
$\{X_t : t = 0, 1, 2, \ldots\}$ with finite state space $S = \{1, \ldots, n\}$
and stationary transition probability matrix $P = [p_{ij}]$. It is
convenient to restrict ourselves to the case where $\{X_t\}$ is irreducible.
Thus, the chain is described by $P$, a vector of steady state
probabilities $\pi = (\pi_1, \ldots, \pi_n)$, and possibly an a priori
distribution of initial states, $p^0$. Given observations of $k$
transitions of this chain we obtain a matrix of transition counts
$[n_{ij}]$ where $n_{ij}$ is the number of transitions into state $j$
from state $i$, and, of course,

\[ \sum_{i=1}^{n} \sum_{j=1}^{n} n_{ij} = k. \]


Maximum likelihood estimators for the one-step transition probabilities
are given in [1]:

\[ \hat{p}_{ij} = n_{ij} \Big/ \sum_{j=1}^{n} n_{ij} = n_{ij} / n_{i\cdot}, \tag{1} \]

where $n_{i\cdot}$ is the observed frequency with which the process visited
state $i$. If a lumpability hypothesis (which we discuss below) was
formulated independent of the sample of $k$ transitions, it could be
tested in terms of the asymptotic $\chi^2$-theory involving differences
between observed and expected frequencies [1], [6]. More often in
practice, however, the hypothesis to be tested is suggested by the
sample of observed transitions. Technically, this gives rise to a
problem of the simultaneous inference type [5]; for "large" samples,
the magnitude of the error (usually lower than the desired size and
power) is insignificant. Thus, in approximate tests, based on asymptotic
distributions of the test statistics, the effect of formulating the
hypotheses to be tested with the aid of the data to be used for the
test is usually ignored. In what follows we shall discuss the use of
asymptotic $\chi^2$-theory in testing hypotheses of lumpability. We begin
under the assumption that the null hypothesis has been determined,
perhaps through reasoning about the physical system being modeled, or
perhaps through preliminary examination of data from $\{X_t\}$. Later, in
section 4, we make some comments about the problem of hypothesis
formulation.
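As a minimal sketch (not part of the original report), the transition counts $n_{ij}$ and the unconstrained estimator (1) can be computed from a record of observed states as follows. The function names, the 1-based state labels, and the short record are hypothetical and chosen only for illustration.

```python
def transition_counts(record, n_states):
    """Count one-step transitions in a record of states labelled 1..n_states."""
    counts = [[0] * n_states for _ in range(n_states)]
    for src, dst in zip(record, record[1:]):
        counts[src - 1][dst - 1] += 1
    return counts


def unconstrained_mle(counts):
    """Estimator (1): each count n_ij divided by its row total n_i. ."""
    estimates = []
    for row in counts:
        total = sum(row)
        # Assumption: a state that was never left gets a row of zeros,
        # since (1) is undefined when n_i. = 0.
        estimates.append([c / total if total else 0.0 for c in row])
    return estimates


# Hypothetical record of 10 observed states, i.e. k = 9 transitions.
record = [1, 2, 2, 1, 3, 1, 2, 3, 3, 1]
counts = transition_counts(record, 3)
p_hat = unconstrained_mle(counts)
k = sum(sum(row) for row in counts)
```

A record of $k + 1$ states always yields $k$ transitions, so `k` here equals `len(record) - 1`.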

1.   LUMPABILITY  HYPOTHESES 


Consider an n-state Markov Chain $\{X_t : t = 0, 1, 2, \ldots\}$

Formally, we have the following:

Definition: $\{X_t\}$ is lumpable with respect to a partition
$\tilde{S} = \{L_1, L_2, \ldots, L_m\}$ of $S$, where $m < n$, if for every
initial state probability vector $p^0$ the resulting chain $\{\tilde{X}_t\}$
is Markovian and the transition probabilities $\tilde{p}_{ij}$ are
invariant under choices of $p^0$.

A necessary and sufficient condition for $\{X_t\}$ to be lumpable with
respect to a partition $\tilde{S}$ of $S$ is that for each pair $(L_i, L_j)$,
the probability of transition from $k$ to some $\ell \in L_j$ is the same
for each $k \in L_i$ (Theorem 6.3.2 [4]). We shall use this
characterization in stating hypotheses of lumpability. The resulting lumped
chain $\{\tilde{X}_t\}$ will be Markovian with transition probabilities
$\tilde{p}_{ij}$, where, for each $k \in L_i$,

\[ \tilde{p}_{ij} = \sum_{\ell \in L_j} p_{k\ell} ; \quad i, j = 1, \ldots, m. \tag{2} \]

The steady state probability vector $\tilde{\pi} = (\tilde{\pi}_1, \ldots, \tilde{\pi}_m)$
of $\{\tilde{X}_t\}$ has components

\[ \tilde{\pi}_j = \sum_{i \in L_j} \pi_i ; \quad j = 1, \ldots, m, \]

and the corresponding prior $\tilde{p}^0$ is similarly determined from $p^0$ by
pooling over the states in the $L_j$'s.
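When $P$ is known, the condition of Theorem 6.3.2 [4] and equation (2) can be checked numerically. The following is a sketch under stated assumptions, not the report's own procedure: the function name `lumped_matrix`, the tolerance `tol`, and the list-of-lists representation are illustrative choices. The matrix is the one used in the numerical example of Section 3.

```python
def lumped_matrix(P, partition, tol=1e-9):
    """Return the lumped matrix of equation (2), or None if the condition fails.

    P is a list of rows; partition is a list of blocks of 1-based state labels.
    """
    lumped = []
    for block_i in partition:
        # Probability of moving from each state k in L_i into each block L_j.
        rows = [[sum(P[k - 1][dest - 1] for dest in block_j)
                 for block_j in partition]
                for k in block_i]
        # Lumpability requires these rows to agree for every k in L_i.
        for row in rows[1:]:
            if any(abs(a - b) > tol for a, b in zip(row, rows[0])):
                return None
        lumped.append(rows[0])
    return lumped


# Transition matrix of the numerical example in Section 3.
P = [
    [0.3, 0.1, 0.2, 0.1, 0.3],
    [0.1, 0.3, 0.1, 0.3, 0.2],
    [0.5, 0.1, 0.0, 0.1, 0.3],
    [0.1, 0.5, 0.2, 0.1, 0.1],
    [0.5, 0.0, 0.1, 0.2, 0.2],
]

lumped = lumped_matrix(P, [[1], [2, 4], [3, 5]])      # satisfies the condition
not_lumpable = lumped_matrix(P, [[1, 2], [3, 4, 5]])  # states 3 and 5 disagree
```

The tolerance only absorbs floating-point rounding; with estimated probabilities the equalities will almost never hold exactly, which is why a statistical test is needed.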

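To illustrate a lumpable Markov Chain, consider a 5-state chain
with transition probabilities $[p_{ij} : i, j = 1, \ldots, 5]$. Suppose that
this chain is lumpable into
$\tilde{S} = \{\{1\}, \{2, 4\}, \{3, 5\}\} = \{L_1, L_2, L_3\}$.
Then the transition probability matrix for $\{\tilde{X}_t\}$ is given by

\[
\begin{bmatrix}
p_{11} & p_{12} + p_{14} & p_{13} + p_{15} \\
p_{21} & p_{22} + p_{24} & p_{23} + p_{25} \\
p_{31} & p_{32} + p_{34} & p_{33} + p_{35}
\end{bmatrix}.
\tag{3}
\]

It follows from (2) that

\[
\begin{aligned}
p_{21} &= p_{41}, & p_{22} + p_{24} &= p_{42} + p_{44}, & p_{23} + p_{25} &= p_{43} + p_{45}, \\
p_{31} &= p_{51}, & p_{32} + p_{34} &= p_{52} + p_{54}, & p_{33} + p_{35} &= p_{53} + p_{55}.
\end{aligned}
\tag{4}
\]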


Here, $\{\tilde{X}_t\}$ will be Markovian for an arbitrary choice of initial state
probability vector. Burke and Rosenblatt [2] give weaker lumpability
conditions that apply whenever there exists at least one choice of
$p^0$ such that $\{\tilde{X}_t\}$ is Markovian. In either case, in practice one
makes  conjectures  (in  the  form  of  hypotheses)  about  combining  certain 
states,  which  result  in  forming  postulated  probability  transition 
matrices  (of  lumped  chains) ,  which  in  turn  satisfy  conditions  such  as 
those  in  (3)  characterizing  lumpability  into  these  combined  states. 

2.   TEST  OF  LUMPABILITY 


Let us denote the hypothesis that $\{X_t\}$ is lumpable into
$\tilde{S} = \{L_1, \ldots, L_m\}$ by the partition $\tilde{S}$ itself, and
suppose we take as the alternate the composite hypothesis that $\{X_t\}$
is not lumpable into $\tilde{S}$:

\[ H_0: \{L_1, \ldots, L_m\} \quad \text{vs.} \quad H_a: \text{not-}\{L_1, \ldots, L_m\}. \]

With the characterization of lumpability discussed above, $H_0$ is
equivalent to stating that, in addition to satisfying the conditions
of a stochastic matrix, $[p_{ij}]$ satisfies conditions (2).


The random variable

\[ \sum_{i=1}^{n} \sum_{j=1}^{n} \frac{(n_{ij} - n_{i\cdot}\, p_{ij})^2}{n_{i\cdot}\, p_{ij}} \tag{5} \]

is asymptotically (as $k \to \infty$) $\chi^2$-distributed with $n(n - 1)$ degrees
of freedom ([1], Theorem 5.3). However, the $p_{ij}$ are unknown, so we
must apply the well known procedure of reducing degrees of freedom to
account for estimation of parameters in (5). For example, Roy [6]
(Theorem 5, page 126) states the rule in a form appropriate for the
current context. We take the random variable (5), with the $p_{ij}$'s
replaced by the corresponding maximum likelihood estimators $\hat{p}_{ij}$, as
the test statistic. $H_0$ is rejected if the calculated value of this
statistic falls above the tabulated $\chi^2$ quantile corresponding to
the desired level of significance, $\alpha$. Note that in calculating the
$\hat{p}_{ij}$ we must use the constraints, such as those given in (4) for our
example, corresponding to $H_0$. In addition, we must use the constraints
to determine the appropriate reduction in the degrees of freedom.

For a given null hypothesis $\{L_1, \ldots, L_m\}$, let $\lambda_i$ denote the
number of the original $n$ states present in $L_i$. By proper initial
choice of labels for the states in $S$, it is possible to state the
null hypothesis in the form

\[
H_0: \tilde{S} = \Big\{ \{1, 2, \ldots, \lambda_1\},\ \{\lambda_1 + 1, \ldots, \lambda_1 + \lambda_2\},\ \ldots,\
\Big\{ \sum_{j=1}^{m-1} \lambda_j + 1, \ldots, n \Big\} \Big\}.
\]

We wish to estimate the $n^2$ parameters $p_{ij}$ subject to

\[ \sum_{j=1}^{n} p_{ij} = 1 ; \quad i = 1, 2, \ldots, n \]

($n$ constraints), and

\[
\sum_{j \in L_s} p_{i'j} = \sum_{j \in L_s} p_{i''j} ; \quad i' \ne i'' \text{ both in } L_i ; \quad i, s = 1, 2, \ldots, m,
\tag{6}
\]

adding $\sum_{i=1}^{m} (\lambda_i - 1) \cdot m = m(n - m)$ constraints. Thus, using
Roy's rule-of-thumb, the number of "independent" parameters we need to
estimate is $n^2 - n - m(n - m)$, so the degrees of freedom of the test
statistic is simply $n^2 - [n^2 - n - m(n - m)] = n + m(n - m)$.
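The counting argument above can be traced step by step in a few lines. This is only an arithmetic sketch that follows the text's own bookkeeping; the function name and the representation of a partition by its block sizes $\lambda_1, \ldots, \lambda_m$ are assumptions made for illustration.

```python
def degrees_of_freedom(block_sizes):
    """Degrees of freedom n + m(n - m), following the counting in the text."""
    n = sum(block_sizes)   # number of original states
    m = len(block_sizes)   # number of lumped states
    row_constraints = n    # each row of P sums to one
    # Constraints (6): (lambda_i - 1) * m for each block, m(n - m) in total.
    lumping_constraints = sum((size - 1) * m for size in block_sizes)
    free_parameters = n * n - row_constraints - lumping_constraints
    return n * n - free_parameters


# Block sizes for the partition {{1}, {2, 4}, {3, 5}} of the 5-state example.
df = degrees_of_freedom([1, 2, 2])
```

For the 5-state example with $m = 3$ this gives $5 + 3(5 - 3) = 11$, the value used in Section 3.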

The maximum likelihood estimators $\hat{p}_{ij}$ of the $p_{ij}$ under the
above constraints can be derived using Lagrangian multipliers with the
log likelihood function $\sum_{i,j} n_{ij} \log p_{ij}$. The form of these
estimators has the following intuitively appealing interpretation: suppose
$k \in L_i$ and $q \in L_s$; in order to estimate $p_{kq}$, first form a maximum
likelihood estimate of $\sum_{j \in L_s} p_{kj}$, where $L_s$ contains $q$. By
equation (6), it is not surprising that this estimate turns out to be a
"pooled" estimate,

\[
\frac{\sum_{r \in L_i} \sum_{j \in L_s} n_{rj}}{\sum_{r \in L_i} n_{r\cdot}},
\tag{7}
\]

where, as before, $n_{r\cdot} = \sum_j n_{rj}$. The proper allocation of the
combined estimate (7) over the individual cells $p_{kj}$, for each
$j \in L_s$, is obtained by weighting (7) by the relative frequencies
$n_{kj} / \sum_{j \in L_s} n_{kj}$. The maximum likelihood estimates of the
$p_{kq}$ are thus given by

\[
\hat{p}_{kq} = \frac{\sum_{r \in L_i} \sum_{j \in L_s} n_{rj}}{\sum_{r \in L_i} n_{r\cdot}}
\cdot \frac{n_{kq}}{\sum_{j \in L_s} n_{kj}}.
\tag{8}
\]

Replacing the unknown $p_{ij}$ in (5) by their estimates $\hat{p}_{ij}$ given above
results in a test statistic which is distributed approximately $\chi^2$
with $n + m(n - m)$ degrees of freedom.
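Estimator (8) can be sketched directly from the count matrix. The code below is an illustration under stated assumptions rather than the report's own implementation: the function name is hypothetical, blocks hold 1-based state labels, and cells are left at zero when state $k$ was never observed entering $L_s$ (a case the text does not discuss). The counts are those of the numerical example in Section 3.

```python
def constrained_mle(counts, partition):
    """Estimates (8) for a count matrix and a partition of 1-based labels."""
    n = len(counts)
    p_hat = [[0.0] * n for _ in range(n)]
    for block_i in partition:
        # Total number of transitions observed out of the states in L_i.
        visits = sum(sum(counts[r - 1]) for r in block_i)
        if visits == 0:
            continue  # assumption: no data for this block, leave zeros
        for block_s in partition:
            # Pooled estimate (7) of the probability of moving from L_i into L_s.
            pooled = sum(counts[r - 1][j - 1]
                         for r in block_i for j in block_s) / visits
            for k in block_i:
                into_block = sum(counts[k - 1][j - 1] for j in block_s)
                if into_block == 0:
                    continue  # assumption: nothing to allocate for this state
                for q in block_s:
                    # Allocate (7) by the relative frequencies n_kq / sum_j n_kj.
                    p_hat[k - 1][q - 1] = pooled * counts[k - 1][q - 1] / into_block
    return p_hat


# Transition counts from the numerical example in Section 3.
counts = [
    [84, 31, 52, 31, 112],
    [22, 46, 13, 54, 33],
    [69, 9, 0, 13, 31],
    [14, 83, 33, 23, 16],
    [118, 0, 23, 48, 42],
]
p_hat = constrained_mle(counts, [[1], [2, 4], [3, 5]])
row_2 = [round(x, 3) for x in p_hat[1]]
```

By construction, the estimated probabilities of entering each block agree across the states of a block, so the estimates satisfy constraints (6) exactly.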

In summary, the procedure for conducting a test of the hypothesis
$\tilde{S}$ of lumpability, at approximately the $\alpha$-level of significance,
is as follows:

1.  Use the observed record $\{x_1, x_2, \ldots, x_{k+1}\}$ to form the
transition frequency matrix $(n_{ij})$.

2.  Compute the estimates $\hat{p}_{ij}$ given in (8).

3.  Calculate the value of the test statistic (5), with $\hat{p}_{ij}$
in place of the unknown $p_{ij}$.

4.  Reject the hypothesis of lumpability if the calculated
value of the test statistic exceeds the tabulated $(1 - \alpha)$th quantile
of the $\chi^2$-distribution with $n + m(n - m)$ degrees of freedom.


3.   A  NUMERICAL  EXAMPLE 


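Consider a special case of our earlier example, where

\[
P = \begin{bmatrix}
.3 & .1 & .2 & .1 & .3 \\
.1 & .3 & .1 & .3 & .2 \\
.5 & .1 & 0 & .1 & .3 \\
.1 & .5 & .2 & .1 & .1 \\
.5 & 0 & .1 & .2 & .2
\end{bmatrix}
\]

with $\tilde{S} = \{\{1\}, \{2, 4\}, \{3, 5\}\}$, so

\[
\tilde{P} = \begin{bmatrix}
.3 & .2 & .5 \\
.1 & .6 & .3 \\
.5 & .2 & .3
\end{bmatrix}.
\]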

We generated 1000 transitions with $P$, using a table of random numbers,
resulting in the frequency matrix

\[
(n_{ij}) = \begin{bmatrix}
84 & 31 & 52 & 31 & 112 \\
22 & 46 & 13 & 54 & 33 \\
69 & 9 & 0 & 13 & 31 \\
14 & 83 & 33 & 23 & 16 \\
118 & 0 & 23 & 48 & 42
\end{bmatrix}.
\]
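A comparable sample can be generated with a pseudo-random number generator in place of a table of random numbers. This is a sketch only: the function name, the seed, and the starting state are assumptions, and the resulting counts will not reproduce the matrix above.

```python
import random

# Transition matrix P of the numerical example.
P = [
    [0.3, 0.1, 0.2, 0.1, 0.3],
    [0.1, 0.3, 0.1, 0.3, 0.2],
    [0.5, 0.1, 0.0, 0.1, 0.3],
    [0.1, 0.5, 0.2, 0.1, 0.1],
    [0.5, 0.0, 0.1, 0.2, 0.2],
]


def simulate_counts(P, n_transitions, seed=0, start=1):
    """Simulate the chain and return the matrix of transition counts."""
    rng = random.Random(seed)      # fixed seed so the run is repeatable
    n = len(P)
    states = range(1, n + 1)
    counts = [[0] * n for _ in range(n)]
    state = start                  # assumed initial state
    for _ in range(n_transitions):
        # Draw the next state using row `state` of P as weights.
        nxt = rng.choices(states, weights=P[state - 1])[0]
        counts[state - 1][nxt - 1] += 1
        state = nxt
    return counts


counts = simulate_counts(P, 1000)
total = sum(sum(row) for row in counts)
```

Cells with zero transition probability, such as $(3, 3)$ and $(5, 2)$, necessarily have zero counts in any simulated sample, as they do in the matrix above.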

Imagine $P$ is unknown, and we wish to use the data in $(n_{ij})$ to test
$H_0$: $\tilde{S}$. The usual (without lumpability constraints) maximum
likelihood estimate of $P$ is given by

\[
\hat{P} = \begin{bmatrix}
.27 & .10 & .17 & .10 & .36 \\
.13 & .27 & .08 & .32 & .20 \\
.57 & .07 & 0 & .11 & .25 \\
.08 & .49 & .20 & .14 & .09 \\
.51 & 0 & .10 & .21 & .18
\end{bmatrix}.
\]

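Under the hypothesized lumpability conditions, the matrix of estimates
$(\hat{p}_{ij})$ is

\[
\hat{P} = \begin{bmatrix}
.270 & .100 & .170 & .100 & .360 \\
.107 & .281 & .080 & .330 & .202 \\
.530 & .081 & 0 & .117 & .272 \\
.107 & .479 & .190 & .133 & .091 \\
.530 & 0 & .096 & .198 & .176
\end{bmatrix}.
\]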


The value of the test statistic is 3.03, which falls well below the
$\alpha = .05$ $\chi^2$ critical value 19.68 with $5 + 3(5 - 3) = 11$ degrees of
freedom. We would thus conclude the observed data is consistent with
the hypothesis of lumpability, in the sense that the test value is not
significant at the .05 level. Of course, in this case with $P$ known,
$H_0$ is known to be true; the "test" is simply an illustration of how
we would have proceeded if $P$ had not been known.
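The final two steps of the procedure can be retraced with the figures printed above. This is a sketch under assumptions: it uses the estimates as printed (rounded to three decimals), skips cells whose expected count is zero (a convention not stated in the text), and takes the critical value 19.68 from the text rather than computing it. With these inputs the statistic comes out close to the reported 3.03.

```python
# Observed transition counts (n_ij) from the example.
counts = [
    [84, 31, 52, 31, 112],
    [22, 46, 13, 54, 33],
    [69, 9, 0, 13, 31],
    [14, 83, 33, 23, 16],
    [118, 0, 23, 48, 42],
]

# Constrained estimates as printed in the example (three decimals).
p_hat = [
    [0.270, 0.100, 0.170, 0.100, 0.360],
    [0.107, 0.281, 0.080, 0.330, 0.202],
    [0.530, 0.081, 0.000, 0.117, 0.272],
    [0.107, 0.479, 0.190, 0.133, 0.091],
    [0.530, 0.000, 0.096, 0.198, 0.176],
]


def chi_square_statistic(counts, probs):
    """Statistic (5): sum of (observed - expected)^2 / expected over cells."""
    stat = 0.0
    for n_row, p_row in zip(counts, probs):
        visits = sum(n_row)                  # n_i.
        for n_ij, p_ij in zip(n_row, p_row):
            expected = visits * p_ij
            if expected > 0:                 # assumption: skip empty cells
                stat += (n_ij - expected) ** 2 / expected
    return stat


n, m = 5, 3
df = n + m * (n - m)        # degrees of freedom used in the text
critical_value = 19.68      # tabulated value quoted in the text
statistic = chi_square_statistic(counts, p_hat)
reject = statistic > critical_value
```

Since the statistic is far below the critical value, `reject` is false, in agreement with the conclusion in the text.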

4.   COMMENTS

We have discussed a test of a given lumpability hypothesis; the
problem of using the observed data both to formulate the hypothesis and
to test it has been mentioned. Even if one disregards this problem,
there is a very significant problem in how to use the data to formulate
appropriate hypotheses. A solution of this problem would be of great
interest, for example, in large computer based information systems,
where man's intuition is not sufficient to cope with the range of
possible alternatives.

